Effects of Treatment on Disruptive Behaviors: A Quantitative 
Synthesis of Single-Subject Researches Using the PEM Approach 

Chiu-Wen Chen & Hsen-Hsing Ma 

The present study uses the PEM approach to synthesize the effectiveness of treatment on disruptive 
behaviors and simultaneously tests whether the higher validity of the PEM approach than that of the 
PND approach is repeatable. A hand search of the Journal of Applied Behavior Analysis was 
conducted, and reference lists from reviewed articles were traced to locate relevant studies. 

Altogether, 106 single-subject studies, which produced 694 effect sizes, were analyzed. The grand 
mean of 106 averaged effect sizes was significant. Results demonstrated that the PEM approach was 
more congruent with the original authors’ judgments than the PND approach. Important findings 
regarding the effectiveness of interventions on the disruptive behaviors are that the strategies of 
differential reinforcement and the token economy system along with multi-components intervention 
were highly effective. 

Key words: PEM (percentage of data points exceeding the median of baseline phase); PND 
(percentage of non-overlapping data); Disruptive behavior; Quantitative synthesis (Meta-analysis) 
of single-subject research 


Meta-analysis provides a quantitative method to reach a conclusion by integrating relevant 
studies on a theoretical issue. Because the data of successive measurements over time in single-case 
experimental designs usually violate the assumptions of parametric statistics, especially those of 
homogeneity and independence of residuals, it is not appropriate to adopt methods used in conventional 
meta-analysis for between-group research. Mastropieri and Scruggs (1985-86) took a nonparametric 
approach, percentage of non-overlapping data (PND), to calculate the effect size for intra-subject research. 
Ma (2006) discussed the advantages and drawbacks of the PND and proposed an alternative method, the 
percentage of data points exceeding the median (PEM), to remedy the shortcomings of the PND. Using 
original authors’ judgment, i.e., the judgment of the author(s) of each located study, on the effectiveness of 
treatment as a validity criterion, the PEM approach demonstrated a higher Spearman correlation with 
original authors’ judgment than the PND approach did. This result was confirmed by Gao and Ma 
(2006). 

Scruggs et al. (1986, p. 262) suggested criteria to evaluate the effectiveness of treatment according to 
the PND scores: (a) highly effective when the score is above .90, (b) moderately effective when the score is 
between .70 and .90, (c) mildly (or questionably) effective when the score is between .50 and .70, and 
(d) ineffective when the score is below .50. However, during coding, it was difficult to differentiate 
between moderately and mildly or questionably effective in the visual judgment of the effectiveness of 
treatment based on the curve change in the baseline and treatment phases. Ma (2006) and Gao and Ma (2006) 
incorporated “mildly (or questionable) effective” into “not effective.” Hence these categories of 
effectiveness were coded: (a) highly effective (coded as 2) with a score higher than .90, (b) 
moderately effective (coded as 1) with a score equal to or higher than .70 but lower than .90, and (c) 
questionable or not effective (coded as 0) with a score lower than .70. In the present study, the authors 
incorporate “mildly (or questionable) effective” into “moderately effective” to form three categories: (a) 
highly effective (coded as 2) with a score above .90, (b) partially effective, including moderately and mildly 
(or questionable) effective (coded as 1), with a score equal to or greater than .50 but less than .90, and (c) 
ineffective (coded as 0) with a score less than .50. The aim is to examine whether the superior validity of 
the PEM approach over the PND approach still holds. 
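
The three-category coding used in the present study can be sketched as a small function. This is only an illustration: the function name `code_effectiveness` is hypothetical, and because the text leaves a score of exactly .90 unassigned, the sketch assumes that such a score falls in the highly effective category.

```python
def code_effectiveness(score: float) -> int:
    """Map a PND or PEM score (0 to 1) to the three-category code.

    2 = highly effective, 1 = partially effective, 0 = ineffective.
    Assumption: a score of exactly .90 is coded as 2, since the text
    does not assign that boundary value to either category.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score >= 0.90:
        return 2
    if score >= 0.50:
        return 1
    return 0


if __name__ == "__main__":
    for s in (0.95, 0.65, 0.50, 0.34):
        print(s, code_effectiveness(s))
```
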

Although some disruptive behaviors are not life threatening or excessively severe, they are considered 
to be problematic by participants’ teachers, parents, caregivers and dentists since disruptive behaviors 
prevent participant participation in instructional activities, family routines, and high-quality dental 
treatment. In the present synthesis, disruptive behavior is defined as “An excessive behavior that can 
interfere with the general activities proceeding at the time.” 

Disruptive behavior is a common problem in educational settings. Scholars have noted that disruptive 
behavior is closely related to less academic engagement, low grades, and a poor performance on 
standardized tests (Bailey, Wolf, & Philips, 1970, p.223; Stage & Quiroz, 1997, p.333). Moreover, Ramp, 
Ulrich, and Dulaney (1971, p.235) indicated that many teachers have had to devise their classroom 
management techniques through experience, because public education has long lacked effective principles 
to aid teachers. Therefore, the development and implementation of effective and acceptable interventions 
for students who exhibit disruption in schools is an important educational problem (Wilkinson, 1997). 

The present study uses the PEM approach to synthesize the effectiveness of treatment of disruptive 
behaviors and simultaneously tests whether the higher validity of the PEM approach than that of the PND 
approach is repeatable. 


Method 


Procedures for Locating Studies 


Studies in this synthesis were acquired according to the following steps. A hand search of a major 
journal in the field, the Journal of Applied Behavior Analysis, was conducted. Descriptors included 
aggressive behavior; inappropriate behavior; noncompliant behavior; destructive behavior; disruptive 
behavior(s); off-task behavior; self-injurious behavior; self-stimulatory behavior; uncooperative behavior; 
and problem behavior. Then, reference lists from reviewed articles and bibliographies from individual 
research reports were examined. Studies including administration of medication were rejected (e.g., Blum, 
Mauk, McComas, & Mace, 1996). All studies that met the following criteria were selected for this 
synthesis: 

1. The objective of the study was the reduction of disruptive behavior; those studies investigating training 
or instructional procedures were included only in cases identifying that the explicit purpose of the 
procedures was to decrease an excess behavior. 

2. A single-subject research design was employed. 

3. Baseline and treatment phases of reversal or multiple-baseline design were recorded using a time series 
graphic display for individual participants. 

For studies meeting the above criteria, the PND and PEM procedures were employed to compute the 
effect size of each pair of baseline-treatment phases. 

Procedures for Coding Each Study 

Categorization of Disruptive Behaviors. The operational definition of disruptive behavior used in the 
present study was adopted from Thomas, Becker, and Armstrong (1968), who classified disruptive 
behavior into five categories: (a) gross motor activities including fiddling, jerking, and out of seat; (b) 
non-verbal noise-making; (c) orienting including off-task; (d) verbalization including crying, inappropriate 
verbalization, and talk outs; and (e) verbal or physical aggression. All behaviors in each category were 
considered incompatible with good classroom learning. 

Categorization of Treatment Procedures. Over the last few decades, numerous behaviorists have 
developed treatment techniques designed to reduce disruptive behaviors and have also developed training 
programs to teach caregivers in a variety of settings. The categorization of treatments is briefly described 
below. 

1. Differential reinforcement of other appropriate behaviors: A procedure involving provision of positive 
reinforcement contingent upon the absence of a disruptive behavior and the presence of desirable 
behaviors during a specified time interval (e.g. Stage et al., 1997). This category includes differential 
reinforcement of other behavior/behavior omission (DRO), differential reinforcement of alternative 
behavior (DRA), differential reinforcement of incompatible behavior (DRI), and differential 
reinforcement of low rates of response (DRL). The disruptive behaviors were ignored and appropriate 
behaviors were reinforced with tangible reinforcers, such as free time, an interesting activity, play-time, 
edible reinforcers, social reinforcers, or multiple reinforcers. 

2. Token economy system: The system enables a child to earn points or tokens for his/her appropriate 
behaviors. Points or tokens can then be exchanged for a wide variety of activities, privileges, or 
priorities (e.g. Ayllon & Roberts, 1974). 

3. Response cost: A procedure involving the withholding of previously given tokens, points, or their 
equivalents when a participant emits a disruptive behavior. 

4. Token economy system plus response cost: Tokens are taken back from participants for rule violation, 
and participants can receive additional opportunities to regain tokens. 

5. Punishment: Use a negative behavioral consequence to ameliorate disruptive behaviors, such as time 
out from interesting activities or a meal, over-correction, and nagging. 

6. Providing preferential tasks: Revising instructional conditions to decrease disruptive behaviors, e.g. 
providing activity choice, choice of task sequence, interesting assignment, outside-reading, or 
decreasing difficulty of task. 

7. Instruction or training: Teach communication, self-control, and other adaptive skills to guide the 
participant to eliminate disruptive behaviors. These instructional strategies include self-management 
training, social skill training, problem-solving training, functional communicative training, compliance 
training, cognitive-behavior training, behavior feedback, oral instruction and written instruction. 

8. Multi-components intervention: Integrating two or more treatments into one treatment package, such as 
DRA and over-correction, DRO and over-correction, functional communication training plus token 
economy system, increased attention and timeout, social skill training and parent involvement, DRL 
plus functional communicative training plus timeout. 

Original author’s conclusion of overall effectiveness of treatment. Conclusions were assigned an 
outcome rating of 2 (effective), 1 (partially effective, including moderately and mildly or questionable 
effective), or 0 (ineffective). 

Settings. Intervention settings were classified as a classroom; dental clinic room; institute (e.g., 
day-program at public residential facility); therapy room including room for experiment, therapy or training; 
home; and other places such as bus, library, fast-food restaurant, playground, and school hallway. 

Interveners. This category includes school staff (including librarians, school bus drivers, teachers, and 
teaching assistants), psychology professionals (including clinical psychology graduate students, 
professional therapists, school psychologists, speech therapists, and research assistants), caregivers 
(including attendants, parents, and caregivers), dental staff (including dentists and dental nurses), and 
composite staff (including teachers and researchers, parents and clinicians, etc). 

Participant classifications. Participants in the present study were classified as regular students 
(including those who perform in the average range on intelligence tests and do not participate in any 
remedial education program), students with attention deficit hyperactivity disorder, students with 
developmental or mental retardation, students with emotionally disturbed behavior, students with autism, 
students with behavior problems, students with brain damage, students with language delay, students with 
learning disability, pre-delinquent students, students with developmental or mental retardation and autism, 
students with developmental or mental retardation and behavior problems, and students with other 
diagnoses. 

Participant age. Age was divided into six groups: primary elementary school children (i.e., 
kindergarten to third grade or younger than 9 years old), upper elementary school children (fourth grade to 
sixth grade or 10 to 12 years old), junior high school students (seventh grade to ninth grade or 13 to 15 years old), 
senior high school students (tenth to twelfth grade or 16 to 18 years old), adults (over 18 years old), and 
composite (several grade levels combined). 

Experimental design. Experimental designs were classified as reversal design, multiple-baseline design, 
reversal plus multiple-baseline design, and other designs. 

Computation of treatment outcomes 

The treatment outcome of each pair of baseline-treatment phases was calculated by the PEM and PND 
methods. The steps in computing the PEM scores are as follows (Ma, 2006). First, a horizontal median 
line is drawn in the baseline phase. This line passes through the median point when the number 
of data points in the baseline phase is odd, and runs between the two middle points when the number of data 
points is even. Second, the median line is extended horizontally into the treatment phase, and the percentage 
of treatment-phase data points above the median line is calculated to obtain the PEM score. If the undesired 
behavior is expected to decrease after the treatment is given, the percentage of treatment-phase data points 
below the median line is calculated instead. The null hypothesis of the PEM 
approach is that if the treatment has no effect, the data points in the treatment phase will fluctuate up and 
down around the median line. That is to say, the data points have a 50% probability of appearing above and 
a 50% probability of appearing below the median line. The PEM score ranges from 0 to 1 and is interpreted 
as an effect size. If there is more than one effect size in an article, they may be averaged to form 
a mean effect size for that article. 
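
The PEM steps described above can be sketched in a few lines. This is a minimal illustration rather than the authors' own procedure as implemented: the function names `pem_score` and `article_effect_size` and the data are hypothetical, and the sketch assumes that a treatment point lying exactly on the median line is not counted as exceeding it.

```python
from statistics import median


def pem_score(baseline, treatment, expect_decrease=False):
    """Proportion of treatment-phase points beyond the baseline median.

    Assumption: points exactly on the median line are not counted.
    """
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    med = median(baseline)  # middle point, or midpoint of the two middle points
    if expect_decrease:
        exceeding = sum(1 for x in treatment if x < med)
    else:
        exceeding = sum(1 for x in treatment if x > med)
    return exceeding / len(treatment)


def article_effect_size(scores):
    """Average several effect sizes from one article into a single value."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical counts of disruptive behavior per session.
    baseline = [8, 6, 9, 7]        # median line at 7.5
    treatment = [5, 7, 8, 3, 2]    # four of the five points fall below 7.5
    print(pem_score(baseline, treatment, expect_decrease=True))
```

With these hypothetical data the baseline median is 7.5 and four of the five treatment points lie below it, so the PEM score is 0.8.
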

The PND is the percentage of data points in the treatment phase over the highest point of the 
distribution in the baseline phase (or below the lowest point of data points in the baseline phase if the 
undesirable behavior is expected to decrease after the intervention is introduced). 
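
The PND definition can be sketched in the same way. The function name `pnd_score` and the data are hypothetical, and the sketch assumes that a treatment point equal to the most extreme baseline point counts as overlapping.

```python
def pnd_score(baseline, treatment, expect_decrease=False):
    """Proportion of treatment-phase points beyond the most extreme baseline point.

    Assumption: a treatment point equal to the extreme baseline value overlaps.
    """
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    if expect_decrease:
        lowest = min(baseline)
        non_overlapping = sum(1 for x in treatment if x < lowest)
    else:
        highest = max(baseline)
        non_overlapping = sum(1 for x in treatment if x > highest)
    return non_overlapping / len(treatment)


if __name__ == "__main__":
    # Hypothetical counts of disruptive behavior per session.
    baseline = [8, 6, 9, 7]        # lowest baseline point is 6
    treatment = [5, 7, 8, 3, 2]    # three of the five points fall below 6
    print(pnd_score(baseline, treatment, expect_decrease=True))
```

For these hypothetical data, three of the five treatment points lie below the lowest baseline point, so the PND score is 0.6.
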

Treatment generalization and follow-up phases were not included in the present analysis. 

Computation of orthogonal slope change and floor effect. The proportion of orthogonal slope change 
and that of floor effect of each pair of baseline-treatment phases were computed in order to examine 
whether orthogonal slope change or floor effect would influence the computations of the PND and PEM 
scores. 

Reliability. The reliability of the coding procedure was established by a separate independent rater 
rating a random sample of 30% of total coding. Disagreement was resolved by discussion. 

Results 


Reliability 

Inter-rater agreement was calculated on a random sample of 30% (n = 32) of the total coding. Two doctoral 
students majoring in educational psychology served as raters for coding. One rater aided the first author of 
the present study in the compilation of outcome ratings for the original authors’ judgments, the PND scores, 
and the PEM scores. The agreement was 95.06% for the original authors’ judgments, 84.77% for PND 
scores, and 88.48% for PEM scores. The second rater classified independent variables and dependent 
variables. Inter-rater agreement for all studies sampled was 93.01%. Disagreements were reassessed and 
resolved in discussion. If the ratings of the rater and the first author remained contradictory, the first author 
made the final decision. 
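
The article does not state the agreement formula. A common choice is simple percent agreement, sketched below under that assumption; the function name `percent_agreement` and the ratings are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two raters assigned the same code.

    Assumption: agreement is the simple percentage of matching codes;
    the article does not specify which index was used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)


if __name__ == "__main__":
    # Hypothetical outcome codes (0, 1, 2) from two raters.
    rater_1 = [2, 1, 0, 2]
    rater_2 = [2, 1, 1, 2]
    print(percent_agreement(rater_1, rater_2))
```
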

Analysis of the Validity of the PND and PEM Approaches 

A total of 106 single-subject studies on treatments of disruptive behavior were analyzed. Table 1 
shows that both the PND scores and PEM scores significantly correlated with the original authors’ 
judgments on treatment effectiveness. However, the PEM scores had a higher correlation coefficient with 
the original authors’ judgments on treatment effects than the PND scores did. 


Table 1. Inter-correlation between Original Authors’ Judgments, the PND Scores, and the PEM Scores 

Variable                            PND        PEM 

With pair as unit (n = 694) 
  Authors’ Judgments                0.60**     0.68** 
  PND                                          0.71** 

With article as unit (n = 106) 
  Authors’ Judgments                0.52**     0.66** 
  PND                                          0.77** 

Note. The correlation coefficients between PND and PEM scores are Pearson 
correlation coefficients, while the others are Spearman correlation coefficients. 

** p < .01. 
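
The Spearman coefficients in Table 1 relate the coded judgments (0, 1, 2) to the effect-size scores. A minimal sketch of such a rank correlation follows, computed as the Pearson correlation of tie-averaged ranks. The helper names and the data are hypothetical and do not reproduce the values in Table 1.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical coded judgments and PEM scores for five phase pairs.
    judgments = [2, 2, 1, 0, 0]
    pem_scores = [0.95, 1.00, 0.60, 0.30, 0.10]
    print(round(spearman(judgments, pem_scores), 3))
```
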

When the original authors judged an intervention as ineffective, the mean effect sizes by both 
measures were below 0.5. Both the mean PND score and the mean PEM score of the ineffective treatments 
thus confirmed the practical evaluation made by the original authors. However, in cases where the original 
authors considered interventions to be only partially effective, the mean PND score was 0.342 and the 
mean PEM score was 0.645. The mean PEM score of partially effective interventions was closer to the 
original authors’ practical evaluation than the mean PND score. The mean PEM score for effective 
interventions was higher than 0.9, while that of the PND was below 0.9. These results indicate that 
synthesis using the PEM approach is more congruent with the original authors’ practical evaluations than 
the PND approach, even when the category “mildly or questionable effective” is integrated with the 
category “moderately effective”. 


Table 2. Mean Effect Sizes Categorized by Original Authors’ Judgments (n = 694) 

Original Authors’          N               Mean PND      Mean PEM      The criterion of 
Judgments                                  Score (SD)    Score (SD)    Scruggs et al. (1986) 

2 (effective)              507 (73.06%)    0.73 (0.35)   0.92 (0.21)   > .9 
1 (partially effective)    37 (5.33%)      0.34 (0.40)   0.65 (0.32)   .5 to .9 
0 (ineffective)            150 (21.61%)    0.09 (0.21)   0.29 (0.36)   < .5 


Proportion of orthogonal slope change and that of floor effect. When 694 effect sizes of each pair of 
baseline-treatment phases were analyzed, only 0.72% (n=5) of the pairs of baseline-treatment phases 
manifested orthogonal slope change. Orthogonal slope change was mostly near the zero level. However, 
14.41% (n=100) of baseline phases had floor data points. These results indicate that the floor effect had a 
more seriously disturbing influence on the computation of effect sizes with the PND approach. 

Influence of Interventions on Effectiveness 

Overall treatment effectiveness. A lag-1 autocorrelation analysis of residuals (i.e., the deviation of each 
score from the mean, paired with the deviation of the following score) was conducted to determine whether 
the data set violated the basic assumption of independence. The autocorrelation was significant, 
r(692) = .395, p < .001 for the PND scores and r(692) = .505, p < .001 for the PEM scores. 
In addition, the number and variances of effect sizes of each independent variable were 
unequal, which may violate the basic assumption of homogeneity of variance. Hence, it is appropriate to 
use nonparametric statistics to analyze the effectiveness of treatments in this data set. 
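
A lag-1 autocorrelation of residuals can be sketched as follows. The article does not give its exact formula, so the sketch assumes the conventional estimator, the sum of products of adjacent deviations from the mean divided by the sum of squared deviations. The function name and the data are hypothetical.

```python
def lag1_autocorrelation(series):
    """Lag-1 autocorrelation of the deviations of a series from its mean.

    Assumption: the conventional estimator is used, i.e. the sum of
    products of adjacent residuals over the sum of squared residuals.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    m = sum(series) / n
    residuals = [x - m for x in series]
    denominator = sum(r * r for r in residuals)
    if denominator == 0:
        raise ValueError("series has no variation")
    numerator = sum(residuals[i] * residuals[i + 1] for i in range(n - 1))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical effect sizes listed in the order in which they were coded.
    scores = [0.9, 1.0, 0.95, 0.3, 0.2, 0.4, 0.85, 0.9]
    print(round(lag1_autocorrelation(scores), 3))
```

A clearly positive value indicates that adjacent scores resemble one another, which is the kind of dependence that argues against treating the 694 effect sizes as independent observations.
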

However, when the effect sizes of an article were averaged to represent the effect size of that article, the 
lag-1 autocorrelation of residuals of the 106 averaged effect sizes was not significant, r(104) 
= .022, p > .05 for the PND scores and r(104) = .038, p > .05 for the PEM scores. Hence, a t-test was 
conducted to determine whether the overall treatment effectiveness of the 106 averaged scores was 
significantly different from the null hypothesis. 


The overall mean effect size of the 106 averaged scores from each article was 0.64 (SD = 0.29) for the 
PND score and 0.84 (SD = 0.21) for the PEM score. The results of the t-tests were significant, t(105) = 22.82, p 
< .001 for the PND score and t(105) = 16.22, p < .001 for the PEM score. These results indicate 
that, taken together, the interventions for all kinds of disruptive behaviors had a significant effect. 
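
The one-sample t statistic for a grand mean can be approximated from the reported summary statistics alone. The sketch below is illustrative: the function name is hypothetical, the null value of .5 for the PEM score follows the PEM null hypothesis, and the null value of 0 for the PND score is an assumption, since the article does not state it. Because the means and standard deviations are rounded to two decimals, the results only approximate the reported t values.

```python
from math import sqrt


def one_sample_t(sample_mean, sample_sd, n, null_value):
    """Return (t, df) for a one-sample t-test computed from summary statistics."""
    if n < 2 or sample_sd <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - null_value) / standard_error, n - 1


if __name__ == "__main__":
    # Rounded summary statistics reported for the 106 averaged effect sizes.
    t_pem, df = one_sample_t(0.84, 0.21, 106, null_value=0.5)
    # Assumption: the PND grand mean is tested against 0.
    t_pnd, _ = one_sample_t(0.64, 0.29, 106, null_value=0.0)
    print(f"PEM: t({df}) = {t_pem:.2f}")
    print(f"PND: t({df}) = {t_pnd:.2f}")
```
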

Effectiveness of Intervention. Table 3 shows that the strategies of differential reinforcement, the token 
economy system, and multi-components intervention were highly effective in the elimination of disruptive 
behaviors. A nonparametric test, the Kruskal-Wallis one-way analysis of variance by ranks 
(K-W ANOVA), revealed a significant difference between the mean ranks of different treatments, χ²(8, N = 
613) = 34.15, p < .01, for PEM scores. 
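
The Kruskal-Wallis statistic can be sketched with the standard rank-sum formula and the usual correction for ties. The function name and the data are hypothetical; the sketch illustrates the test in general and does not reproduce the analysis reported here.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return (H, df) for several independent samples, corrected for ties."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    counts = Counter(pooled)
    # Average rank of each distinct value; tied values share the mean rank.
    rank_of = {}
    position = 1
    for value in sorted(counts):
        rank_of[value] = position + (counts[value] - 1) / 2
        position += counts[value]
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


if __name__ == "__main__":
    # Hypothetical PEM scores for three kinds of intervention.
    scores_by_intervention = [
        [1.00, 0.95, 0.90, 1.00],
        [0.80, 0.70, 0.85, 0.60],
        [0.40, 0.55, 0.30, 0.65],
    ]
    h, df = kruskal_wallis_h(scores_by_intervention)
    print(f"H({df}) = {h:.2f}")
```

The statistic is referred to a chi-square distribution with one degree of freedom fewer than the number of groups, which is why the test over nine intervention categories is reported with 8 degrees of freedom.
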


Table 3. Results of Mean Effect Size, K-W ANOVAs, and Mann-Whitney U Tests by Intervention 

Intervention                              N of effect   Mean PND Score (SD)          Mean PEM Score (SD) 
                                          sizes 

1 Differential reinforcement              136           0.72 (0.36)                  0.90 (0.24) 
2 Token economy system                    98            0.67 (0.38)                  0.90 (0.21) 
3 Response cost                           29            0.60 (0.38)                  0.88 (0.30) 
4 Token economy system plus               12            0.65 (0.47)                  0.90 (0.29) 
  response cost 
5 Punishment                              79            0.66 (0.40)                  0.85 (0.31) 
6 Providing preferential tasks            66            0.46 (0.44)                  0.73 (0.38) 
7 Instruction or training                 141           0.56 (0.42)                  0.76 (0.34) 
8 Multi-components intervention           35            0.75 (0.40)                  0.98 (0.06) 
9 Other procedures                        17            0.579 (0.33)                 0.847 (0.26) 

K-W ANOVA                                 613           χ²(8, N = 613) = 26.80**     χ²(8, N = 613) = 34.15** 
Mann-Whitney U Test                       -             1 > (6, 7)^a; 2 > 6^a;       (1, 2, 5) > (6, 7)^a; 
                                                        8 > (2, 5, 6, 7, 9)^a        8 > (2, 5, 6, 7, 9)^a 

Note. Throughout the tables in the present study, the N (number) of effect sizes of PND scores is the 
same as that of the PEM scores. In the multiple post hoc comparisons using the 
Mann-Whitney U test, the numbers represent the serial number of each variable in the table. 
Variables grouped within the same parentheses do not differ significantly from one another in the mean 
ranks of their effect sizes by pair comparisons, and the mean ranks of the variables on the 
left side of “>” are significantly higher (i.e., larger effect size) than those of the variables on the right 
side. 

^a Significant at least at p < .05. 

** p < .01. 

Effectiveness for different categories of disruptive behaviors. As Table 4 shows, it was easier to 
eliminate disruptive behaviors such as noise, orienting, and gross motor activities than to control 
composites of multiple disruptive behaviors, which covered about 72% of the disruptive behaviors analyzed. 

Table 4. Results of Mean Effect Size, K-W ANOVAs, and Mann-Whitney U Tests by Disruptive Behaviors 

Behavior Class               N      Mean PND Score (SD)         Mean PEM Score (SD) 

1 Aggression                 46     0.62 (0.44)                 0.88 (0.27) 
2 Noise                      19     0.93 (0.16)                 0.98 (0.01) 
3 Orienting                  14     0.86 (0.25)                 0.93 (0.27) 
4 Gross motor activities     33     0.71 (0.32)                 0.95 (0.14) 
5 Verbalization              80     0.71 (0.35)                 0.87 (0.26) 
6 Composite                  502    0.51 (0.43)                 0.72 (0.39) 

K-W ANOVA                    694    χ²(5, N = 694) = 6.44*      χ²(5, N = 694) = 14.47** 
Mann-Whitney U Test                 2 > (1, 5)^a;               (1, 2, 3, 4, 5) > 6^a 
                                    (2, 3, 4, 5) > 6^a 

Note. 

^a Significant at least at p < .05. 


Influence of Study Characteristics on Effectiveness 


Table 5. Mean Effect Size by Study Characteristics, K-W ANOVAs, and Mann-Whitney U Tests 

                                          N      Mean PND score (SD)            Mean PEM score (SD) 

Experimental design 
1 Reversal design                         395    0.53 (0.44)                    0.71 (0.41) 
2 Multiple-baseline design                133    0.61 (0.40)                    0.88 (0.25) 
3 Reversal plus multiple-baseline         48     0.73 (0.39)                    0.867 (0.31) 
  design 
4 Other design                            118    0.56 (0.41)                    0.803 (0.30) 
K-W ANOVA                                 694    χ²(3, N = 694) = 10.39**       χ²(3, N = 694) = 16.22** 
Mann-Whitney U Test                       -      3 > (1, 2, 4)^a                (2, 3) > 1^a 

Intervener 
1 School staff                            426    0.60 (0.41)                    0.81 (0.32) 
2 Psychology professional                 223    0.51 (0.45)                    0.65 (0.44) 
3 Caregiver                               13     0.71 (0.43)                    0.79 (0.07) 
4 Dental clinic staff                     12     0.65 (0.41)                    0.98 (0.05) 
5 Composite staff                         20     0.69 (0.34)                    0.98 (0.09) 
K-W ANOVA                                 694    χ²(4, N = 694) = 6.62          χ²(4, N = 694) = 26.45*** 
Mann-Whitney U Test                       -      -                              (1, 3, 4, 5) > 2^a 

Sex 
3 Composite                               81     0.713 (0.34)                   0.91 (0.20) 
4 Sex not specified                       78     0.54 (0.42)                    0.80 (0.29) 
K-W ANOVA                                 616    χ²(2, N = 616) = 9.02*         χ²(2, N = 616) = 8.03* 
Mann-Whitney U Test                       -      3 > (1, 2)^a                   3 > (1, 2)^a 

Grade level 
1 Primary elementary                      323    0.55 (0.43)                    0.72 (0.41) 
2 Secondary elementary                    158    0.58 (0.42)                    0.78 (0.35) 
3 Junior high school                      113    0.68 (0.41)                    0.83 (0.32) 
4 Senior high school                      40     0.55 (0.40)                    0.86 (0.28) 
5 Composite                               51     0.45 (0.45)                    0.80 (0.28) 
6 Adult                                   9      0.66 (0.41)                    0.89 (0.33) 
K-W ANOVA                                 694    χ²(5, N = 694) = 10.36         χ²(5, N = 694) = 9.06 

Diagnosis 
1 Regular education student               186    0.62 (0.38)                    0.86 (0.25) 
2 Students with attention deficit         47     0.57 (0.43)                    0.71 (0.42) 
  hyperactivity disorder 
3 Students with developmental or          149    0.54 (0.43)                    0.76 (0.38) 
  mental retardation 
4 Students with emotional disturbance     44     0.67 (0.38)                    0.87 (0.24) 
5 Students with autism                    28     0.40 (0.48)                    0.48 (0.50) 
6 Students with behavior problems         37     0.62 (0.44)                    0.82 (0.33) 
7 Students with brain damage              14     0.50 (0.52)                    0.62 (0.49) 
8 Students with language delay            24     0.31 (0.45)                    0.40 (0.49) 
9 Students with learning disability       24     0.71 (0.38)                    0.84 (0.34) 
10 Pre-delinquent students                12     0.08 (0.26)                    0.65 (0.40) 
11 Students with developmental or         58     0.54 (0.44)                    0.63 (0.43) 
   mental retardation and autism 
12 Students with developmental or         19     0.80 (0.23)                    0.96 (0.13) 
   mental retardation and behavior 
   problems 
13 Other diagnoses                        52     0.60 (0.44)                    0.85 (0.29) 
K-W ANOVA                                 694    χ²(12, N = 694) = 38.88***     χ²(12, N = 694) = 47.10*** 
Mann-Whitney U Test                       -      (1, 9, 12) > 5^a;              (1, 2, 3, 4, 6, 12, 13) > (5 …); 
                                                 (1, 2, 3, 4, 6, 9, 11, 12,     1 > 10^a; 
                                                 13) > (8, 10)^a;               (1, 3, 4, 6, 12, 13) > 11; 
                                                 12 > 3^a                       (11, 9) > 8^a; 
                                                                                12 > (1, 3, 7, 10, 11)^a 

The results of the effect of different moderators (study characteristics) on the effect sizes of treatments are 
presented in Table 5: 

Influence of experimental design. About 57% of the treatment phases used a reversal design (withdrawal 
design), 19.16% used a multiple-baseline design, 6.92% used a reversal plus multiple-baseline design, and 
17% used an AB or other design. The difference in mean ranks among the experimental designs was 
statistically significant according to the K-W ANOVA. Studies using the multiple-baseline 
design or the multiple-baseline plus reversal design were more likely to show greater 
treatment effectiveness than studies using the reversal design alone. 

Influence of intervener. School staff were the major interveners, acting in 61.38% of 
interventions, followed by psychology professionals in 32.13% and caregivers in 1.87%. Dental clinic staff were 
involved as interveners in 1.73% of interventions, and composite staff in about 
2.88%. In pairwise comparisons, the treatment effect obtained by psychology professionals as interveners 
was significantly lower than that obtained by the other classes of interveners, as measured by the PEM approach. 
Further research is needed to determine whether the degree of familiarity between the intervener(s) and the 
participant moderated the treatment effect. 

Influence of setting. The data show that 72.05% of interventions took place in classrooms, 13.83% in therapy 
rooms, 5.76% in institutes, and 2.88% in homes or in dental clinics. The overall difference in mean ranks among 
settings was statistically significant according to the K-W ANOVA. In pairwise comparisons, 
treatments that took place in a therapy room produced a significantly lower effect than interventions 
that took place in other types of settings, for both the PND and PEM scores. 

Influence of sex. A few of the studies did not report the participants’ sex, and this subcategory was excluded 
from the statistical analysis. Therefore, the χ² of the K-W ANOVA has only 2 degrees of freedom. Studies 
that included participants of both sexes were associated with stronger outcomes than studies 
including only male or only female participants. However, the sex difference in the effectiveness of treatment on 
disruptive behaviors was not significant when participants of a single sex took part in the experiment. 

Influence of Grade Level. No relation was observed between grade level and outcome effectiveness for 
either the PND or PEM scores. 

Influence of diagnosis. A relation was observed between diagnosis and outcome effectiveness. 
Treatment of the disruptive behaviors of students with language delay and of those with 
autism was the least effective (effect sizes were below .5). 

Discussion 


The Validity of PND and PEM approach 

A total of 106 single- subject research studies producing 694 effect sizes were included in the 
meta-analysis. With the original authors’ judgment on the effectiveness of treatment on disruptive behaviors 
as the validity criterion, the present study has confirmed that the validity of the PEM approach is superior to 
that of the PND approach, even when the category “mildly or questionable effective” was integrated into 
the category “moderately effective”. The most plausible explanation for the difference between the PND 
approach and the PEM approach is that the floor effect has a more seriously disturbing influence when using 
the PND method. Scholars have suggested that the crucial drawback of the PND method is floor or ceiling 
effects (Scruggs et al., 1987; Faith, Allison, & Gorman, 1997; Ma, 2006; Gao & Ma, 2006). When a 
baseline phase has floor or ceiling data points, the PND score will be zero, suggesting an ineffective 
treatment even if a specific intervention might be very effective. 
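
The floor effect can be illustrated with a small hypothetical data set in which one baseline session already sits at zero. The function names and the data are illustrative, and the sketch assumes strict inequalities in both scores, so that points equal to the reference value do not count.

```python
from statistics import median


def pnd_decrease(baseline, treatment):
    """PND when a decrease is expected: share of points below the lowest baseline point."""
    lowest = min(baseline)
    return sum(1 for x in treatment if x < lowest) / len(treatment)


def pem_decrease(baseline, treatment):
    """PEM when a decrease is expected: share of points below the baseline median."""
    med = median(baseline)
    return sum(1 for x in treatment if x < med) / len(treatment)


if __name__ == "__main__":
    # Hypothetical counts of disruptive behavior; one baseline session is at the floor (0).
    baseline = [4, 0, 5, 6, 5]
    treatment = [1, 0, 1, 0, 0]
    print("PND:", pnd_decrease(baseline, treatment))  # no point can fall below 0
    print("PEM:", pem_decrease(baseline, treatment))  # every point is below the median of 5
```

In this example the single floor point in the baseline forces the PND score to 0, while the PEM score is 1 because every treatment point lies below the baseline median.
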

When the 694 effect sizes of each pair of baseline-treatment phases were analyzed, only 0.72% (n = 5) of 
the pairs of baseline-treatment phases manifested an orthogonal slope change. Hence, it poses no threat to 
the validity of the PEM and PND approaches. This result replicates the findings of Ma (2006) and Gao 
and Ma (2006). 

Other important findings regarding the effectiveness of interventions on disruptive behaviors are that: (a) 
the overall grand mean effect size of the 106 articles, each represented by its averaged effect size, is 
significantly different from the null hypothesis of .5 for the PEM score; (b) the intervention strategies were 
effective in eliminating disruptive behaviors, and the strategies of differential reinforcement, 
the token economy system, and multi-components intervention in particular were highly effective; (c) it was easier to 
eliminate disruptive behaviors such as noise, orienting, and gross motor activities than to control 
composites of disruptive behaviors; (d) using the multiple-baseline design or the multiple-baseline design 
plus the reversal design is more likely to yield greater treatment effectiveness than using the reversal 
design alone; (e) treatment conducted by psychology professionals as interveners was significantly less 
effective than that carried out by other classes of interveners; (f) treatment that took place in a therapy 
room produced a significantly smaller effect than interventions that took place in other types of settings; (g) 
the sex difference in the effectiveness of treatment on disruptive behaviors was not significant when participants 
of a single sex took part in experiments; (h) no relation was observed between a participant’s age and 
outcome effectiveness; and (i) treatment of the disruptive behaviors of students with language delay and 
autism was shown to be less effective than that of students with other kinds of diagnosis. 

The result that the treatments conducted by psychology professionals as interveners and the 
treatments implemented in a therapy room were least effective is contrary to the authors’ expectation. 
Because psychology professionals have expertise, their interventions should be more effective 
than those of other kinds of agents. Whether familiarity with the participants or the reinforcement history 
experienced with the participants has an impact on the effectiveness of intervention needs to be 
investigated in the future. 